Real-time GPU rendering with performance guaranteed power management

ABSTRACT

Systems, apparatuses, and methods for performing real-time video rendering with performance guaranteed power management are disclosed. A system includes at least a software driver, a power management unit, and a plurality of processing elements for performing rendering tasks. The system receives inputs which correspond to rendering tasks which need to be performed. The software driver monitors the inputs that are received and the number of rendering tasks to which they correspond. The software driver also monitors the amount of time remaining until the next video synchronization signal. The software driver determines which performance setting will minimize power consumption while still allowing enough time to finish the rendering tasks for the current frame before the next video synchronization signal. Then, the software driver causes the power management unit to provide this performance setting to the plurality of processing elements as they perform the rendering tasks for the current frame.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/457,179, now U.S. Pat. No. 11,100,698, entitled “REAL-TIME GPU RENDERING WITH PERFORMANCE GUARANTEED POWER MANAGEMENT”, filed Jun. 28, 2019, the entirety of which is incorporated herein by reference.

BACKGROUND

Description of the Related Art

Various applications rely on the real-time rendering of images or video content. Cloud gaming, virtual reality, and gaming spectatorship, for example, all involve real-time rendering of content. Real-time rendering of video frames often uses significant processing resources that consume a large amount of power. In a real-time rendering environment, the requirement to control generated image frame latencies and the desire to avoid missed frames place special demands on power management. On one hand, it is desirable to run at the highest clock rate possible to minimize latency and guarantee that image rendering finishes on time. On the other hand, if the processing hardware begins to overheat or nears a thermal threshold, the hardware reduces its clock rate, which then results in missed frames. These issues are particularly challenging for power-constrained or thermally constrained platforms.

Various frame-based real-time applications include gaming applications as well as other types of rendering applications that submit multiple jobs per frame and repeat this process at a constant or variable frame rate. The per-frame processing unit workload (e.g., number of jobs, time per job, resources per job) can vary in complexity and computational demand depending on the application runtime behavior. For such applications, the processing unit either completes frame execution sufficiently early to allow timely use of the frame (e.g., display or transmission), or the processing unit is late in completing frame execution, which results in the frame being dropped or consumed late. Such delays negatively impact the user experience.

In view of the above, improved methods for managing real-time video rendering with performance guaranteed power management are desired.

BRIEF DESCRIPTION OF THE DRAWINGS

The advantages of the methods and mechanisms described herein may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of one implementation of a computing system.

FIG. 2 is a block diagram of another implementation of a computing system.

FIG. 3 is a timing diagram of one implementation of selecting a performance setting for frames being rendered based on queue occupancy.

FIG. 4 is an example of a table for mapping a number of incoming tasks and remaining time to a performance setting in accordance with one implementation.

FIG. 5 is a generalized flow diagram illustrating one implementation of a method for performing real-time video rendering with performance guaranteed power management.

FIG. 6 is a generalized flow diagram illustrating one implementation of a method for controlling the performance setting for processing hardware based on application type.

DETAILED DESCRIPTION OF IMPLEMENTATIONS

In the following description, numerous specific details are set forth to provide a thorough understanding of the methods and mechanisms presented herein. However, one having ordinary skill in the art should recognize that the various implementations may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the approaches described herein. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.

Systems, apparatuses, and methods for implementing real-time GPU rendering with a performance guaranteed power management governor based on a use-case-driven, event-based feed-forward control window are disclosed herein. In one implementation, a system includes at least a software driver, a power management unit, and one or more processing elements for performing rendering tasks. The system receives inputs which correspond to rendering tasks which need to be performed. The software driver monitors the number of inputs that are received and the number of rendering tasks to which they correspond. The software driver also monitors the amount of time remaining until the next video synchronization signal. The software driver determines which performance setting will minimize power consumption while still allowing enough time to finish the rendering tasks for the current frame before the next video synchronization signal. Then, the software driver causes the power management unit to provide this performance setting to the one or more processing elements as they perform the rendering tasks for the current video frame.

Referring now to FIG. 1, a block diagram of one implementation of a computing system 100 is shown. In one implementation, computing system 100 includes at least processors 105A-N, control unit 110, input/output (I/O) interfaces 120, bus 125, memory controller(s) 130, network interface 135, memory device(s) 140, power supply 145, power management unit 150, display controller 160, and display 165. In other implementations, computing system 100 includes other components and/or computing system 100 is arranged differently. Processors 105A-N are representative of any number of processors which are included in system 100, with the number of processors varying from implementation to implementation.

In one implementation, processor 105A is a general purpose processor, such as a central processing unit (CPU). In one implementation, processor 105N is a data parallel processor with a highly parallel architecture. Data parallel processors include graphics processing units (GPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and so forth. In one implementation, processor 105N is a GPU which provides pixels to display controller 160 to be driven to display 165. In some implementations, processors 105A-N include multiple data parallel processors. In one implementation, control unit 110 is a software driver executing on processor 105A. In other implementations, control unit 110 includes control logic which is independent from processors 105A-N and/or incorporated within processors 105A-N. Generally speaking, control unit 110 is any suitable combination of software and/or hardware.

Memory controller(s) 130 are representative of any number and type of memory controllers accessible by processors 105A-N. Memory controller(s) 130 are coupled to any number and type of memory device(s) 140. Memory device(s) 140 are representative of any number and type of memory devices. For example, the type of memory in memory device(s) 140 includes Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), NAND Flash memory, NOR flash memory, Ferroelectric Random Access Memory (FeRAM), or others.

I/O interfaces 120 are representative of any number and type of I/O interfaces (e.g., peripheral component interconnect (PCI) bus, PCI-Extended (PCI-X), PCIE (PCI Express) bus, gigabit Ethernet (GBE) bus, universal serial bus (USB)). Various types of peripheral devices (not shown) are coupled to I/O interfaces 120. Such peripheral devices include (but are not limited to) displays, keyboards, mice, printers, scanners, media recording devices, external storage devices, network interface cards, and so forth. Network interface 135 is used to receive and send network messages across a network. Bus 125 is representative of any type of bus or fabric with any number of links for connecting together the different components of system 100.

In one implementation, queue(s) 142 are stored in memory device(s) 140. In other implementations, queue(s) 142 are stored in other locations within system 100. Queue(s) 142 are representative of any number and type of queues which are allocated in system 100. In one implementation, queue(s) 142 store rendering tasks that are to be performed for frames being rendered. In one implementation, the rendering tasks are enqueued in queue(s) 142 based on inputs received via network interface 135. For example, in one scenario, the inputs are generated by a user of a video game application and sent over a network (not shown) to system 100. In another implementation, the inputs are generated by a peripheral device connected to I/O interfaces 120.

In one implementation, power management unit 150 supplies power from power supply 145 to components of system 100, and power management unit 150 controls various power-performance states of components within system 100. Responsive to receiving updates from control unit 110, power management unit 150 causes other components within system 100 to either increase or decrease their current power-performance state. In various implementations, changing a power-performance state includes changing a current operating frequency of a device and/or changing a current voltage level of a device. When the power-performance states of processors 105A-N are reduced, the computing tasks being executed by processors 105A-N take longer to complete.

In one implementation, control unit 110 sends commands to power management unit 150 to cause processor 105N to operate at a relatively high power-performance state responsive to determining that a number of rendering tasks for the current frame is greater than a given threshold. In one implementation, the given threshold is adjusted based on the amount of time remaining until the next video synchronization signal. For example, the less time that remains until the next video synchronization signal, the lower the given threshold is programmed.
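By way of illustration only, the time-adjusted threshold can be sketched in Python as follows. The sketch assumes, hypothetically, that the threshold equals the number of rendering tasks processor 105N can finish at its relatively low power-performance state in the remaining time; the rate constant LOW_STATE_TASKS_PER_MS and the function names are illustrative and are not part of the described implementation.

```python
# Hypothetical tasks-per-millisecond rate that processor 105N sustains at its
# relatively low power-performance state; a real value would come from profiling.
LOW_STATE_TASKS_PER_MS = 4.0


def task_threshold(time_remaining_ms: float) -> float:
    """Threshold on queued rendering tasks; it shrinks as the next VSync approaches."""
    return LOW_STATE_TASKS_PER_MS * max(time_remaining_ms, 0.0)


def choose_state(num_tasks: int, time_remaining_ms: float) -> str:
    """Return 'high' when the queued tasks exceed the time-adjusted threshold."""
    return "high" if num_tasks > task_threshold(time_remaining_ms) else "low"


# The same 20 tasks fit at the low state with 10 ms left, but not with 4 ms left.
print(choose_state(20, 10.0))  # low
print(choose_state(20, 4.0))   # high
```

Because the threshold is a linear function of the remaining time in this sketch, it drops automatically as the frame period elapses, matching the relationship described above.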

In various implementations, computing system 100 is a computer, laptop, mobile device, server, or any of various other types of computing systems or devices. It is noted that the number of components of computing system 100 varies from implementation to implementation. For example, in other implementations, there are more or fewer of each component than the number shown in FIG. 1. It is also noted that in other implementations, computing system 100 includes other components not shown in FIG. 1 and/or one or more of the components shown in computing system 100 are omitted. Additionally, in other implementations, computing system 100 is structured in other ways than shown in FIG. 1.

Turning now to FIG. 2, a block diagram of another implementation of a computing system 200 is shown. In one implementation, system 200 includes GPU 205, system memory 225, and local memory 230 which belongs to GPU 205. System 200 also includes other components which are not shown to avoid obscuring the figure. GPU 205 includes at least command processor 235, scheduler 250, compute units 255A-N, memory controller 220, global data share 270, level one (L1) cache 265, and level two (L2) cache 260. It is noted that compute units 255A-N can also be referred to herein as a “plurality of processing elements”. In other implementations, GPU 205 includes other components, omits one or more of the illustrated components, has multiple instances of a component even if only one instance is shown in FIG. 2, and/or is organized in other suitable manners. In one implementation, the circuitry of GPU 205 is included in processor 105N (of FIG. 1).

In various implementations, computing system 200 executes any of various types of software applications. As part of executing a given software application, a host CPU (not shown) of computing system 200 launches rendering tasks to be performed on GPU 205. Command processor 235 receives commands from the host CPU and uses scheduler 250 to issue corresponding rendering tasks to compute units 255A-N. Rendering tasks executing on compute units 255A-N read and write data to global data share 270, L1 cache 265, and L2 cache 260 within GPU 205. Although not shown in FIG. 2, in one implementation, compute units 255A-N also include one or more caches and/or local memories within each compute unit 255A-N. In various implementations, compute units 255A-N execute any number of frame-based applications which are rendering frames to be displayed, streamed, or consumed in real time.

In one implementation, queue(s) 232 are stored in local memory 230. In other implementations, queue(s) 232 are stored in other locations within system 200. Queue(s) 232 are representative of any number and type of queues which are allocated in system 200. In one implementation, queue(s) 232 store rendering tasks to be performed by GPU 205.

In one implementation, the performance setting of GPU 205 is adjusted based on a number of rendering tasks for the current frame stored in queue(s) 232 as well as the amount of time remaining until the next video synchronization signal. In various implementations, the performance setting of GPU 205 is adjusted so as to finish the rendering tasks before the next video synchronization signal while also achieving a power consumption target. In one implementation, the performance setting is adjusted by a control unit (not shown). The control unit can be a software driver executing on a CPU (not shown), or the control unit can include control logic implemented within a programmable logic device (e.g., FPGA) or control logic implemented as dedicated hardware (e.g., ASIC). In some cases, the control unit includes a combination of software and hardware.

In one implementation, the performance setting of GPU 205 corresponds to a specific power setting, power state, or operating point of GPU 205. In one implementation, the control unit uses dynamic voltage and frequency scaling (DVFS) to change the frequency and/or voltage of GPU 205 to limit the power consumption to a chosen power allocation. Each separate frequency and voltage setting can correspond to a separate performance setting. In one implementation, the performance setting selected by the control unit controls a phase-locked loop (PLL) unit (not shown) which generates and distributes corresponding clock signals to GPU 205. In one implementation, the performance setting selected by the control unit controls a voltage regulator (not shown) which provides a supply voltage to GPU 205. In other implementations, other mechanisms can be used to change the operating point and/or power settings of GPU 205 in response to receiving a command from the control unit to arrive at a particular performance setting.
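As a non-limiting sketch, each performance setting can be modeled as a frequency/voltage pair. The operating points below are hypothetical, and the relative power estimate relies on the general approximation that dynamic CMOS power scales with V^2 * f; it is included only to illustrate why lower settings save power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceSetting:
    """One DVFS operating point: a PLL frequency paired with a regulator voltage."""
    level: int
    freq_mhz: float
    voltage_v: float

    def relative_dynamic_power(self, ref: "PerformanceSetting") -> float:
        # Dynamic CMOS power scales roughly with V^2 * f when switched
        # capacitance and activity factor are held constant.
        return (self.voltage_v ** 2 * self.freq_mhz) / (ref.voltage_v ** 2 * ref.freq_mhz)


# Hypothetical operating points for GPU 205; actual values are device specific.
SETTINGS = [
    PerformanceSetting(0, 500.0, 0.70),
    PerformanceSetting(1, 1000.0, 0.80),
    PerformanceSetting(2, 1500.0, 0.90),
    PerformanceSetting(3, 2000.0, 1.00),
]

for s in SETTINGS:
    print(s.level, round(s.relative_dynamic_power(SETTINGS[-1]), 3))
```

Because voltage is lowered together with frequency, the lowest setting in this example draws roughly an eighth of the dynamic power of the highest while running at a quarter of its clock rate.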

Referring now to FIG. 3, a timing diagram of one implementation of selecting a performance setting for frames being rendered based on queue occupancy is shown. When rendering frames of a video sequence, in one implementation, a software driver changes the performance setting of the rendering hardware based at least in part on the queue occupancy. The queue occupancy refers to the number of rendering tasks that have been enqueued for the processing hardware (e.g., GPU) for the current frame being rendered.

A frame period is shown which is bounded by the video synchronization signals (or VSyncs) corresponding to the start and finish of each frame being rendered. In the first frame period shown in FIG. 3, the initial performance setting 325 is set for the processing hardware for the frame being rendered. The initial performance setting 325 can be a default setting in one implementation. In another implementation, the initial performance setting 325 is programmable based on the type of application, hints generated by the application, an estimate of the complexity of the current frame being rendered, and/or other factors. In one implementation, the software driver responsible for controlling the performance setting monitors the queue occupancy of the rendering task queue(s). The software driver samples the queue occupancy multiple times per frame period, and the sampling frequency can be either fixed or programmable depending on the implementation. As shown in FIG. 3, the first occupancy sample 305 specifies that a particular number of rendering tasks have been enqueued. Based on this sample 305, the software driver maintains the current performance setting 325.

The next queue occupancy sample 310 is lower than the previous sample 305. This indicates that the number of rendering tasks has decreased because the processing hardware has completed one or more rendering tasks. Accordingly, in response to detecting the reduction in queue occupancy from sample 305 to sample 310, the software driver lowers the performance setting to performance setting 330 to decrease the power consumption of the processing hardware. This trend continues for the next two samples 315 and 320, with the software driver lowering the performance setting to performance settings 335 and 340, respectively. This reduction in the performance setting is acceptable since the processing hardware has fewer rendering tasks to finish for the current frame. When the video synchronization signal occurs, the current frame is sent to the display, sent over a network to one or more clients, or sent to other locations.
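The occupancy-driven behavior of the first frame period in FIG. 3 can be approximated with the following Python sketch, under the simplifying assumption that the driver moves by one setting per sample. The occupancy values and the four-level setting range are hypothetical, and the helper names are illustrative only.

```python
def next_setting(prev_occupancy: int, occupancy: int, setting: int,
                 min_level: int = 0, max_level: int = 3) -> int:
    """Step the setting down when occupancy falls, up when it rises, else hold."""
    if occupancy < prev_occupancy:
        return max(setting - 1, min_level)
    if occupancy > prev_occupancy:
        return min(setting + 1, max_level)
    return setting


def trace_frame(samples: list[int], initial_setting: int) -> list[int]:
    """Return the setting in force after each occupancy sample within one frame."""
    settings, setting, prev = [], initial_setting, None
    for occupancy in samples:
        if prev is not None:
            setting = next_setting(prev, occupancy, setting)
        settings.append(setting)
        prev = occupancy
    return settings


# Hypothetical occupancies for samples 305, 310, 315, and 320.
print(trace_frame([12, 9, 6, 3], initial_setting=3))  # [3, 2, 1, 0]
```

The first sample only establishes a baseline, so the initial setting is maintained, just as performance setting 325 is maintained after sample 305.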

For the next frame period, the first occupancy sample 345 indicates that there are relatively few rendering tasks for this frame. Accordingly, the performance setting 340 can remain at a relatively low level for the processing hardware at the start of the frame period. The next occupancy sample 350 indicates that the number of rendering tasks has been reduced, allowing for a lower performance setting 370. However, the subsequent occupancy sample 355 indicates that the queue occupancy has increased. This can be due to receiving multiple rendering tasks, which can be caused by player inputs in a gaming scenario, user movements in a virtual reality environment, or other inputs or events generated in other types of applications.

When the software driver detects the increase in occupancy at occupancy sample 355, as well as the diminishing time remaining until the next video synchronization signal, the software driver responds by increasing the power provided to the processing hardware by selecting performance setting 375. In one implementation, performance setting 375 is the maximum performance setting for the processing hardware. The next two occupancy samples 360 and 365 indicate that the number of rendering tasks for the current frame has been reduced. However, the time available for finishing these rendering tasks has also decreased, so the software driver maintains the relatively high performance setting 375 for the processing hardware.
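One possible way to account for both occupancy and remaining time, sketched here under stated assumptions, is to compute the task rate needed to drain the queue before the next VSync and pick the lowest setting whose throughput meets it. The throughput table TASKS_PER_MS and the sample values are hypothetical; they are chosen to reproduce the qualitative pattern of the second frame period in FIG. 3, where the maximum setting is held while the queue drains in step with the shrinking time budget.

```python
# Hypothetical throughput, in rendering tasks per millisecond, at each
# performance setting (index = setting level; the last entry is the maximum).
TASKS_PER_MS = [1.0, 2.0, 3.0, 4.0]


def required_rate(occupancy: int, time_remaining_ms: float) -> float:
    """Tasks per millisecond needed to drain the queue before the next VSync."""
    if time_remaining_ms <= 0:
        return float("inf")
    return occupancy / time_remaining_ms


def setting_for(occupancy: int, time_remaining_ms: float) -> int:
    """Lowest setting whose throughput meets the required rate (saturating at max)."""
    need = required_rate(occupancy, time_remaining_ms)
    for level, rate in enumerate(TASKS_PER_MS):
        if rate >= need:
            return level
    return len(TASKS_PER_MS) - 1


# (occupancy, ms until VSync) for hypothetical samples 345, 350, 355, 360, 365.
samples = [(4, 12.0), (2, 10.0), (28, 7.0), (15, 4.0), (6, 1.6)]
print([setting_for(o, t) for o, t in samples])  # [0, 0, 3, 3, 3]
```

Samples 360 and 365 have fewer queued tasks than sample 355, but because the remaining time shrank proportionally, the required rate stays high and the maximum setting is kept.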

The examples shown in timing diagram 300 are indicative of one particular implementation of a software driver adjusting the performance setting based on the queue occupancy of the rendering task queue(s). In other implementations, the software driver can make other types of adjustments based on changes in the queue occupancy. It should be understood that the granularity at which updates to the performance setting are made can vary according to the implementation. Likewise, the frequency at which the software driver checks the queue occupancy can vary according to the implementation.

Turning now to FIG. 4, one implementation of a table 400 for mapping a number of incoming tasks and remaining time to a performance setting is shown. In one implementation, control logic or a software driver performs a lookup of columns 405 and 410 of table 400 to retrieve a corresponding performance setting from column 415. The retrieved performance setting is used to program a plurality of processing elements (e.g., GPU 205 of FIG. 2) to operate at a specific operating point. In one implementation, column 405 includes different possible values for a number of incoming tasks (e.g., rendering tasks). In other implementations, column 405 includes other values which represent the amount of work that needs to be performed for rendering the current frame. For example, in another implementation, column 405 is measured in terms of queue occupancy. In other implementations, column 405 is measured in terms of a number of hints received, a number of events detected, or otherwise. In one implementation, column 410 includes entries for different amounts of time remaining until the next video synchronization signal.

In one implementation, a software driver performs a lookup of table 400 using a number of rendering tasks and an amount of time remaining until the next video synchronization signal. A performance setting is retrieved from the matching entry if the lookup results in a hit. If the lookup results in a miss, the software driver can interpolate a performance setting value based on the two closest entries. After retrieving and/or calculating a particular performance setting, the software driver causes the rendering hardware to operate at the particular performance setting. In one implementation, the software driver performs multiple lookups of table 400 per frame to update the performance setting as the number of rendering tasks and/or the amount of time remaining changes during the frame period.
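A minimal sketch of the table lookup follows, assuming a hypothetical layout in which each time-remaining bucket holds sorted (number of tasks, performance setting) entries. On a miss, it interpolates linearly along the task axis between the two nearest entries of the bucket and rounds up, which is one conservative reading of interpolating "based on the two closest entries"; the bucket values, entries, and function name are illustrative only.

```python
import bisect
import math

# Hypothetical table 400: time-remaining bucket (ms) -> entries of
# (number of rendering tasks, performance setting), sorted by task count.
TABLE = {
    4.0: [(4, 1), (8, 2), (16, 3)],
    8.0: [(4, 0), (8, 1), (16, 2), (32, 3)],
    16.0: [(8, 0), (16, 1), (32, 2), (64, 3)],
}
MAX_SETTING = 3


def lookup(num_tasks: int, time_remaining_ms: float) -> int:
    """Return a performance setting for the given task count and time remaining."""
    # Use the largest bucket not exceeding the remaining time (assume less time, not more).
    buckets = sorted(TABLE)
    idx = bisect.bisect_right(buckets, time_remaining_ms) - 1
    if idx < 0:
        return MAX_SETTING  # less time than the smallest bucket: run at maximum
    entries = TABLE[buckets[idx]]
    counts = [count for count, _ in entries]
    pos = bisect.bisect_left(counts, num_tasks)
    if pos < len(counts) and counts[pos] == num_tasks:
        return entries[pos][1]  # hit
    if pos == 0:
        return entries[0][1]  # fewer tasks than the smallest entry
    if pos == len(counts):
        return MAX_SETTING  # more tasks than the largest entry
    (c0, s0), (c1, s1) = entries[pos - 1], entries[pos]
    frac = (num_tasks - c0) / (c1 - c0)
    # Interpolate between the two closest entries and round up so the result
    # never falls below what the table implies.
    return math.ceil(s0 + frac * (s1 - s0))


print(lookup(8, 8.0))    # 1 (hit)
print(lookup(12, 10.0))  # 2 (8 ms bucket, interpolated between 8 and 16 tasks)
```

Rounding up and snapping the remaining time down to a bucket both bias the result toward a higher setting, trading a little power for a lower risk of missing the VSync.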

In one implementation, there is a separate table 400 for each different application that could run on the system. For example, for a cloud gaming environment, a first table 400A is stored by the system. For a virtual reality application, a second table 400B is stored by the system. Any number of other tables 400C-N can also be stored by the system for different applications. Each application can have different characteristics and complexity for the rendering tasks likely to be performed when executing the application. Accordingly, each application has a separate table 400 to accommodate the different performance settings that should be used based on the number of rendering tasks and the time remaining.

In one implementation, each table 400 is programmed by software. The tables 400 can be programmed based on test data and/or through real-time training that monitors the application's behavior. For example, in one implementation, table 400 is programmed by software with default values for a given application. Then, during runtime, software can monitor the given application to determine whether any changes are observed in the run-time environment compared to the test scenarios that were used to generate the default values for table 400. If rendering tasks are taking longer than predicted, or if rendering tasks are finishing sooner than predicted, the values stored in performance setting column 415 can be updated to more accurately reflect the given application's behavior. In another implementation, rather than using table 400 to select a performance setting, the software driver uses a formula to calculate the performance setting based on a number of incoming tasks and the time remaining. In other implementations, the software driver uses other suitable techniques for selecting the performance setting.
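The runtime refinement of column 415 might, for example, be sketched as a simple rule that raises an entry when a frame finishes late and lowers it when a frame leaves ample slack. The 20% slack band, the dictionary standing in for table 400, and the function name are assumptions made for illustration.

```python
def update_entry(table: dict, key: tuple, finished_ms: float, deadline_ms: float,
                 slack_fraction: float = 0.2, max_setting: int = 3) -> int:
    """Nudge one performance-setting entry toward the observed behavior.

    Raises the setting when rendering finished after the deadline and lowers it
    when more than `slack_fraction` of the time budget went unused.
    """
    setting = table[key]
    if finished_ms > deadline_ms:
        table[key] = min(setting + 1, max_setting)
    elif finished_ms < deadline_ms * (1.0 - slack_fraction):
        table[key] = max(setting - 1, 0)
    return table[key]


# Hypothetical entry: 8 tasks with 8 ms remaining maps to setting 1.
table = {(8, 8.0): 1}
print(update_entry(table, (8, 8.0), finished_ms=9.0, deadline_ms=8.0))  # 2 (late)
print(update_entry(table, (8, 8.0), finished_ms=5.0, deadline_ms=8.0))  # 1 (ample slack)
print(update_entry(table, (8, 8.0), finished_ms=7.0, deadline_ms=8.0))  # 1 (within band)
```

The dead band between 80% and 100% of the budget keeps an entry from oscillating when tasks finish close to, but before, the deadline.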

Turning now to FIG. 5, one implementation of a method 500 for performing real-time video rendering with performance guaranteed power management is shown. For purposes of discussion, the steps in this implementation and those of FIG. 6 are shown in sequential order. However, it is noted that in various implementations of the described methods, one or more of the elements described are performed concurrently, in a different order than shown, or are omitted entirely. Other additional elements are also performed as desired. Any of the various systems or apparatuses described herein are configured to implement method 500.

A software driver monitors inputs that correspond to rendering tasks for a current frame being rendered (block 505). In one implementation, the inputs are events associated with a user on a network. For example, the user is playing a video game in a cloud gaming scenario. In another implementation, the inputs are user movements in a virtual reality environment. In other implementations, other types of inputs for other types of scenarios are received in block 505. The software driver also monitors the amount of time remaining in the current frame period until the next video synchronization signal (block 510). Next, the software driver determines the lowest possible performance setting for completing the incoming rendering tasks in the amount of time remaining until the next video synchronization signal (block 515). In one implementation, the performance setting for the processing hardware (e.g., GPU) includes corresponding voltage and frequency values.

If the software driver determines that the incoming rendering tasks cannot be completed in the amount of time remaining until the next video synchronization signal even at the maximum performance setting (conditional block 520, “no” leg), then the software driver causes the previous frame to be replayed and sets the processing hardware to idle or to a relatively low performance setting (e.g., the lowest possible performance setting) (block 525). Alternatively, the software driver can cause the current frame to be delayed in block 525 rather than replaying the previous frame. If there is a performance setting that allows the incoming rendering tasks to be completed in the time remaining until the next VSync (conditional block 520, “yes” leg), then the software driver causes the processing hardware to operate at a given performance setting (block 530). In one implementation, the given performance setting is the lowest possible performance setting for completing the incoming rendering tasks in the amount of time remaining until the next video synchronization signal. In another implementation, the given performance setting is one setting higher than the lowest possible performance setting to provide a margin of error for completing the incoming rendering tasks in the amount of time remaining until the next video synchronization signal. In other implementations, the margin of error can be increased to two or more settings higher than the lowest possible performance setting.
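Blocks 515 through 530 of method 500 can be sketched as follows, assuming a hypothetical per-setting cost in milliseconds per rendering task. The margin parameter models the optional margin of error of one or more settings described above, and the replay fallback corresponds to block 525; all names are illustrative.

```python
from enum import Enum

# Hypothetical average milliseconds per rendering task at each performance
# setting (index = setting level; the last entry is the maximum setting).
MS_PER_TASK = [2.0, 1.0, 0.5, 0.25]


class Action(Enum):
    RENDER = "render"
    REPLAY_PREVIOUS = "replay_previous"


def plan(num_tasks: int, time_remaining_ms: float, margin: int = 0) -> tuple[Action, int]:
    """Choose the lowest setting that finishes before the next VSync.

    `margin` raises the choice by that many settings (clamped at the maximum).
    If even the maximum setting cannot finish in time, replay the previous
    frame and drop to the lowest setting.
    """
    top = len(MS_PER_TASK) - 1
    for level, cost in enumerate(MS_PER_TASK):
        if num_tasks * cost <= time_remaining_ms:
            return Action.RENDER, min(level + margin, top)
    return Action.REPLAY_PREVIOUS, 0


print(plan(10, 8.0))            # lowest feasible setting is 2
print(plan(10, 8.0, margin=1))  # one setting of headroom
print(plan(40, 8.0))            # infeasible: replay previous frame
```

A variant that delays the current frame instead of replaying the previous one would only change the action returned on the infeasible path.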

If rendering of the current frame has not finished (conditional block 540, “no” leg), then after some amount of time elapses, or after some event (e.g., a change in queue occupancy) is detected, method 500 returns to block 505. It is noted that some hysteresis can be added to the loop to prevent the performance setting from being changed too frequently. If rendering of the current frame has finished (conditional block 540, “yes” leg), then method 500 ends. It is noted that method 500 can be performed for each video frame of a video sequence being rendered.
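One possible form of the hysteresis mentioned above is sketched below; it is an assumption rather than a detail of the described method. Increases take effect immediately so that the deadline is protected, while decreases take effect only after a lower setting has been requested for a configurable number of consecutive samples. The class and parameter names are hypothetical.

```python
class HysteresisGovernor:
    """Raise immediately; lower only after `lower_after` consecutive lower requests."""

    def __init__(self, initial: int, lower_after: int = 3):
        self.current = initial
        self.lower_after = lower_after
        self._lower_votes = 0  # consecutive samples that asked for a lower setting

    def update(self, proposed: int) -> int:
        if proposed > self.current:
            # An increase is applied at once so the VSync deadline is protected.
            self.current = proposed
            self._lower_votes = 0
        elif proposed < self.current:
            self._lower_votes += 1
            if self._lower_votes >= self.lower_after:
                self.current = proposed  # adopt the most recent lower request
                self._lower_votes = 0
        else:
            self._lower_votes = 0
        return self.current


gov = HysteresisGovernor(initial=2, lower_after=3)
print([gov.update(p) for p in [1, 1, 1, 3, 2, 2, 3]])  # [2, 2, 1, 3, 3, 3, 3]
```

The asymmetry reflects the goal of method 500: a late decrease only costs some power, whereas a late increase can cost a frame.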

Turning now to FIG. 6, one implementation of a method 600 for controlling the performance setting for processing hardware based on application type is shown. A control unit determines which application is currently being executed by the system (block 605). Then, the control unit loads a performance setting lookup table (e.g., table 400 of FIG. 4) that corresponds to the application (block 610). Next, the control unit uses the table to select a performance setting for the processing hardware based on queue occupancy and an amount of time remaining until the next video synchronization signal (block 615). If the control unit detects a different application being executed by the system (conditional block 620, “yes” leg), then method 600 returns to block 610. Otherwise, if the given application continues to be executed by the system (conditional block 620, “no” leg), then method 600 returns to block 615. It is noted that in some cases, a single application can have multiple different performance setting lookup tables. For example, a video game application can have different scenes with different amounts of rendering complexity. For a first scene of the application, the control unit can load a first table; for a second scene, the control unit can load a second table; and so on.
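Method 600, including the optional scene-specific tables, can be sketched as a selector that reloads a table only when the (application, scene) key changes. The application and scene names, the fallback to a default table, and the placeholder table contents are hypothetical.

```python
from typing import Optional


class TableSelector:
    """Keeps the lookup table for the running application (and optionally its
    current scene) active, reloading only when that key changes."""

    def __init__(self, tables: dict, default_key: tuple):
        self.tables = tables
        self.default_key = default_key
        self.active_key: Optional[tuple] = None
        self.active_table: Optional[dict] = None
        self.loads = 0  # number of table loads (block 610)

    def select(self, app: str, scene: Optional[str] = None) -> dict:
        key = (app, scene)
        if key not in self.tables:
            key = (app, None)  # no scene-specific table: use the application-wide one
        if key not in self.tables:
            key = self.default_key  # unknown application: use a default table
        if key != self.active_key:  # application or scene changed (block 620, "yes")
            self.active_key = key
            self.active_table = self.tables[key]
            self.loads += 1
        return self.active_table


# Placeholder tables keyed by (application, scene); contents are illustrative.
tables = {
    ("cloud_gaming", None): {"id": "400A"},
    ("cloud_gaming", "boss_fight"): {"id": "400A-boss"},
    ("vr", None): {"id": "400B"},
    ("default", None): {"id": "400-default"},
}
selector = TableSelector(tables, default_key=("default", None))
for app, scene in [("cloud_gaming", None), ("cloud_gaming", None),
                   ("cloud_gaming", "boss_fight"), ("vr", None), ("unknown", None)]:
    print(app, scene, selector.select(app, scene)["id"], selector.loads)
```

Repeated calls for the same application return the already loaded table without incrementing the load count, which corresponds to the “no” leg of conditional block 620.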

In various implementations, program instructions of a software application are used to implement the methods and/or mechanisms described herein. For example, program instructions executable by a general or special purpose processor are contemplated. In various implementations, such program instructions can be represented by a high level programming language. In other implementations, the program instructions can be compiled from a high level programming language to a binary, intermediate, or other form. Alternatively, program instructions can be written that describe the behavior or design of hardware. Such program instructions can be represented by a high-level programming language, such as C. Alternatively, a hardware description language (HDL) such as Verilog can be used. In various implementations, the program instructions are stored on any of a variety of non-transitory computer readable storage mediums. The storage medium is accessible by a computing system during use to provide the program instructions to the computing system for program execution. Generally speaking, such a computing system includes at least one or more memories and one or more processors configured to execute program instructions.

It should be emphasized that the above-described implementations are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

What is claimed is:
 1. A system comprising: one or more processingelements; a power management unit; and a control unit configured to:monitor inputs representative of how many rendering tasks are waiting tobe performed for a current frame being rendered; monitor an amount oftime remaining until a next video synchronization signal; select a givenperformance setting for allowing one or more rendering tasks to becompleted in the amount of time remaining until the next videosynchronization signal while also achieving a power consumption target;and convey an indication of the given performance setting to the powermanagement unit to cause the one or more processing elements to operateat the given performance setting.
 2. The system as recited in claim 1,wherein the control unit is configured to increase the given performancesetting responsive to determining the inputs indicate that morerendering tasks have been queued while the amount of time remaininguntil the next video synchronization signal has decreased.
 3. The systemas recited in claim 1, wherein the control unit is configured to selecta lowest possible performance setting which allows the one or morerendering tasks to be completed in the amount of time remaining untilthe next video synchronization signal.
 4. The system as recited in claim1, wherein the control unit is configured to map a combination of anumber of rendering tasks and the amount of time remaining to the givenperformance setting.
 5. The system as recited in claim 4, wherein thecontrol unit is configured to maintain a table for mapping thecombination of the number of rendering tasks and the amount of timeremaining to the given performance setting, wherein each entry in thetable maps a given number of rendering tasks and a given amount of timeremaining to a corresponding performance setting.
6. The system as recited in claim 1, wherein the inputs comprise user movements in a virtual reality environment.
7. The system as recited in claim 1, wherein the control unit is further configured to: cause the one or more processing elements to operate at a relatively high performance setting responsive to determining that the number of rendering tasks is greater than a threshold, wherein the threshold is based on the amount of time remaining until the next video synchronization signal; and cause the one or more processing elements to operate at a relatively low performance setting responsive to determining that the number of rendering tasks is less than or equal to the threshold.
8. A method comprising: monitoring, by a control unit, inputs representative of how many rendering tasks are waiting to be performed for a current frame being rendered; monitoring an amount of time remaining until a next video synchronization signal; selecting a given performance setting for allowing one or more rendering tasks to be completed in the amount of time remaining until the next video synchronization signal while also achieving a power consumption target; and conveying an indication of the given performance setting to a power management unit to cause one or more processing elements to operate at the given performance setting.
9. The method as recited in claim 8, further comprising increasing the given performance setting responsive to determining the inputs indicate that more rendering tasks have been queued while the amount of time remaining until the next video synchronization signal has decreased.
10. The method as recited in claim 8, further comprising selecting a lowest possible performance setting which allows the one or more rendering tasks to be completed in the amount of time remaining until the next video synchronization signal.
11. The method as recited in claim 8, further comprising mapping a combination of a number of rendering tasks and the amount of time remaining to the given performance setting.
12. The method as recited in claim 11, further comprising maintaining a table for mapping the combination of the number of rendering tasks and the amount of time remaining to the given performance setting, wherein each entry in the table maps a given number of rendering tasks and a given amount of time remaining to a corresponding performance setting.
13. The method as recited in claim 8, wherein the inputs comprise user movements in a virtual reality environment.
14. The method as recited in claim 8, further comprising: causing the one or more processing elements to operate at a relatively high performance setting responsive to determining that the number of rendering tasks is greater than a threshold, wherein the threshold is based on the amount of time remaining until the next video synchronization signal; and causing the one or more processing elements to operate at a relatively low performance setting responsive to determining that the number of rendering tasks is less than or equal to the threshold.
15. An apparatus comprising: a first processor; a second processor; and a memory storing program instructions, wherein the program instructions are executable by the first processor to: monitor inputs representative of how many rendering tasks are waiting to be performed for a current frame being rendered; monitor an amount of time remaining until a next video synchronization signal; select a given performance setting for allowing one or more rendering tasks to be completed in the amount of time remaining until the next video synchronization signal while also achieving a power consumption target; and cause the second processor to operate at the given performance setting.
16. The apparatus as recited in claim 15, wherein the program instructions are executable by the first processor to increase the given performance setting responsive to determining the inputs indicate that more rendering tasks have been queued while the amount of time remaining until the next video synchronization signal has decreased.
17. The apparatus as recited in claim 15, wherein the program instructions are executable by the first processor to select a lowest possible performance setting which allows the one or more rendering tasks to be completed in the amount of time remaining until the next video synchronization signal.
18. The apparatus as recited in claim 15, wherein the program instructions are executable by the first processor to map a combination of a number of rendering tasks and the amount of time remaining to the given performance setting.
19. The apparatus as recited in claim 18, wherein the program instructions are executable by the first processor to maintain a table for mapping the combination of the number of rendering tasks and the amount of time remaining to the given performance setting, wherein each entry in the table maps a given number of rendering tasks and a given amount of time remaining to a corresponding performance setting.
20. The apparatus as recited in claim 15, wherein the program instructions are executable by the first processor to: cause the second processor to operate at a relatively high performance setting responsive to determining that the number of rendering tasks is greater than a threshold, wherein the threshold is based on the amount of time remaining until the next video synchronization signal; and cause the second processor to operate at a relatively low performance setting responsive to determining that the number of rendering tasks is less than or equal to the threshold.
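The selection policies recited in the claims (choosing the lowest performance setting that still finishes the pending rendering tasks before the next video synchronization signal, precomputing a table keyed on task count and time remaining, and a two-level threshold policy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the PerfSetting type, the three setting names, and the fixed per-task time estimates are hypothetical, and a real driver would presumably derive its estimates from measured workload and clock data.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class PerfSetting:
    # Hypothetical performance state; the name and timing value are illustrative only.
    name: str
    ms_per_task: float  # assumed average time to finish one rendering task


# Ordered from lowest power (slowest) to highest power (fastest); assumed values.
SETTINGS: List[PerfSetting] = [
    PerfSetting("low", 2.0),
    PerfSetting("mid", 1.25),
    PerfSetting("high", 0.8),
]


def lowest_sufficient_setting(pending_tasks: int, time_remaining_ms: float,
                              settings: Sequence[PerfSetting] = SETTINGS) -> PerfSetting:
    """Return the lowest setting whose estimated finish time fits before the next vsync.

    Falls back to the fastest setting when no setting can meet the deadline.
    """
    for s in settings:
        if pending_tasks * s.ms_per_task <= time_remaining_ms:
            return s
    return settings[-1]


def build_table(task_counts: Sequence[int], times_ms: Sequence[float],
                settings: Sequence[PerfSetting] = SETTINGS) -> Dict[Tuple[int, float], str]:
    """Precompute a (task count, time remaining) -> setting-name lookup table."""
    return {(n, t): lowest_sufficient_setting(n, t, settings).name
            for n in task_counts for t in times_ms}


def threshold_setting(pending_tasks: int, time_remaining_ms: float,
                      low: PerfSetting = SETTINGS[0],
                      high: PerfSetting = SETTINGS[-1]) -> PerfSetting:
    """Two-level policy: the threshold is how many tasks the low setting can finish in time."""
    threshold = int(time_remaining_ms // low.ms_per_task)
    return high if pending_tasks > threshold else low


if __name__ == "__main__":
    print(lowest_sufficient_setting(8, 10.0).name)  # mid
```

With 10 ms left before the next synchronization signal, eight pending tasks would take 16 ms at the assumed "low" setting but exactly 10 ms at "mid", so "mid" is chosen. Because the threshold in threshold_setting is derived from the time remaining, the same task count can fall on either side of it as the deadline approaches, which mirrors the dependence on time remaining recited in claims 7, 14, and 20.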